aa wrapper and KLU migration to PNM for PFs#302
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Replaces the optional AppleAccelerate.jl weak dependency / package extension with an in-tree AccelerateWrapper submodule that binds directly to libSparse.dylib on macOS. The wrapper exposes a cached symbolic+numeric LDLT factorization, in-place dense/sparse-RHS solves, and SpMM/SpMV; PNM's PTDF/LODF/VirtualPTDF code paths are migrated from AAFactorization to the new AAFactorCache. Non-Apple builds get a stub module that errors on use, removing the runtime extension gating.
Changes:
- New
src/AccelerateWrapper/submodule with libSparse ccalls, a symbolic/numeric factor cache, and dense/sparse solve + SpMM bindings (macOS-gated; stubs elsewhere). - Inlined
_calculate_PTDF_matrix_AppleAccelerate/_calculate_LODF_matrix_AppleAccelerateintosrc/, swapped_solve_factorization/_create_factorization/with_solverto dispatch onAAFactorCache, and replaced_has_apple_accelerate_ext()with_has_apple_accelerate_backend() = Sys.isapple(). - Removed
ext/AppleAccelerateExt.jl, theAppleAccelerateweakdep/compat entries, theruntests.jlinstall step, and the corresponding Aqua stale-dep ignore; addedtest/test_accelerate_wrapper.jl.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/AccelerateWrapper/AccelerateWrapper.jl |
Submodule entry; macOS-gated includes plus non-Apple stubs. |
src/AccelerateWrapper/libsparse_bindings.jl |
C struct layouts, enums, and mangled @ccall bindings into libSparse. |
src/AccelerateWrapper/aa_cache.jl |
AAFactorCache lifecycle: lower-triangle pattern, symbolic/numeric (re)factor, finalizer. |
src/AccelerateWrapper/solve_dense.jl |
In-place dense vector/matrix solves and \ overload. |
src/AccelerateWrapper/solve_sparse_rhs.jl |
Block-packed sparse-RHS solver with reusable per-cache scratch. |
src/AccelerateWrapper/spmm.jl |
aa_spmm! / aa_spmv! bindings with per-call CSC→Apple index translation. |
src/PowerNetworkMatrices.jl |
Includes the wrapper, imports its symbols, drops AppleAccelerate forward decls. |
src/linalg_settings.jl |
Drops the extension probe; adds _has_apple_accelerate_backend and updates check_linalg_backend. |
src/solver_dispatch.jl |
Adds with_solver overload specialized on AAFactorCache; updates docstring. |
src/ptdf_calculations.jl |
Inlines PTDF AppleAccelerate path using AccelerateWrapper. |
src/lodf_calculations.jl |
Inlines LODF AppleAccelerate path using AccelerateWrapper. |
src/virtual_ptdf_calculations.jl |
Switches _solve_factorization to typed AAFactorCache overload; updates docs. |
ext/AppleAccelerateExt.jl |
Removed (logic moved into src/). |
Project.toml |
Drops AppleAccelerate weakdep, extension entry, and compat bound. |
test/PowerNetworkMatricesTests.jl |
Removes :AppleAccelerate from Aqua stale-dep ignore list. |
test/runtests.jl |
Drops the Pkg.add("AppleAccelerate") branch and updates the comment. |
test/test_accelerate_wrapper.jl |
New unit tests for AAFactorCache, dispatch, with_solver, and KLU parity. |
test/test_ptdf.jl, test/test_lodf.jl, test/test_virtual_ptdf.jl, test/test_powerflow_matrix_types.jl, test/test_network_modification.jl |
Switch guards to _has_apple_accelerate_backend, update messages, and update expected factor type to AAFactorCache. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """ | ||
| solve!(cache, B) -> B | ||
|
|
||
| Solve `A · X = B` in place. `B::StridedVecOrMat{Float64}` must have | ||
| first-dimension size equal to `cache.n` and unit stride in the first | ||
| dimension. Multiple columns of `B` are handled in a single libSparse call. | ||
| """ | ||
| function solve!(cache::AAFactorCache, B::StridedMatrix{Cdouble}) |
| This solver is only available on macOS. | ||
| Install AppleAccelerate: | ||
| julia> using Pkg; Pkg.add(\"AppleAccelerate\")""" | ||
| _has_apple_accelerate_backend() = Sys.isapple() |
| # Called by libSparse on failure with a null-terminated C string; we surface | ||
| # the message via `@error`. Must be a top-level function so `@cfunction` can | ||
| # resolve it. | ||
| function _libsparse_report_error(msg::Cstring) | ||
| s = unsafe_string(msg) | ||
| @error "libSparse reported an error" message = s | ||
| return nothing | ||
| end | ||
|
|
||
| # `reportError` is fired by libSparse before it returns a failure status — | ||
| # we log the message so it ends up in user output rather than libSparse's | ||
| # own stderr. Passing libc malloc/free explicitly (the C_NULL "use Apple |
There was a problem hiding this comment.
Seconded. Looking at AppleAccelerate.jl, I see instead: @cfunction(text->error(unsafe_string(text)), Cvoid, (Cstring, )). Probably don't need to go that low-level, but @error seems risky.
| @@ -383,7 +410,7 @@ efficient when the prerequisite matrices with factorization are already availabl | |||
| - `BA::BA_Matrix`: The branch susceptance weighted incidence matrix (B * A) | |||
|
|
|||
| # Keyword Arguments | |||
| - `linear_solver::String = "KLU"`: | |||
| - `linear_solver::String = _default_linear_solver()`: | |||
| Linear solver algorithm for matrix computations. Currently only "KLU" is supported | |||
| col_start = pos | ||
| for p in nzrange(A, j) | ||
| if rowval[p] >= j | ||
| pos += 1 | ||
| pos > cache.nnz_tri && return _pattern_mismatch(op) | ||
| cache.rowIndices[pos] == Cint(rowval[p] - 1) || | ||
| return _pattern_mismatch(op) | ||
| end | ||
| end | ||
| cache.columnStarts[j + 1] == Clong(pos) || | ||
| return _pattern_mismatch(op) | ||
| # silence unused-warning on col_start | ||
| col_start === col_start |
Performance ResultsPrecompile Time
Execution Time
|
luke-kiernan
left a comment
There was a problem hiding this comment.
Big picture question: what's the motivation or goal here? Yeah, using the public Julia package has its issues and limitations, but it's also modular and maintained in tandem with the _jll binary. So I'd like to hear your reasoning here.
If there's specific shortcomings of these libraries, I'm willing to go open a PR. I expect it wouldn't get merged for a while (~months), but at least then we wouldn't need to maintain our own low-level bindings indefinitely.
My other question: why is AA no longer an extension?
| factorization_type::SparseFactorization_t = SparseFactorizationLDLT, | ||
| ) | ||
| n = size(A, 1) | ||
| n == size(A, 2) || throw(DimensionMismatch("matrix must be square; got $(size(A))")) |
There was a problem hiding this comment.
If the user tries to apply a symmetric factorization to a non symmetric matrix, this will silently take the lower triangle and factorize a different matrix...potentially hazardous. I'd at least check that the pattern is symmetric, with a flag to bypass.
| Release the libSparse numeric and symbolic handles held by `cache`, leaving | ||
| Julia-side state intact. Idempotent. | ||
| """ | ||
| function _free_handles!(cache::AAFactorCache) |
There was a problem hiding this comment.
And if it isn't SparseStatusOk? Leaving it around seems iffy. Unlikely to cause issues in practice, but could clog up the heap with un-GCed objects in theory.
| # Called by libSparse on failure with a null-terminated C string; we surface | ||
| # the message via `@error`. Must be a top-level function so `@cfunction` can | ||
| # resolve it. | ||
| function _libsparse_report_error(msg::Cstring) | ||
| s = unsafe_string(msg) | ||
| @error "libSparse reported an error" message = s | ||
| return nothing | ||
| end | ||
|
|
||
| # `reportError` is fired by libSparse before it returns a failure status — | ||
| # we log the message so it ends up in user output rather than libSparse's | ||
| # own stderr. Passing libc malloc/free explicitly (the C_NULL "use Apple |
There was a problem hiding this comment.
Seconded. Looking at AppleAccelerate.jl, I see instead: @cfunction(text->error(unsafe_string(text)), Cvoid, (Cstring, )). Probably don't need to go that low-level, but @error seems risky.
| # Status codes and shared error helper | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
| const KLU_OK = 0 |
There was a problem hiding this comment.
nitpick: int backed enum would be more stylistic.
| # Keyword Arguments | ||
| - `linear_solver::String = "KLU"`: | ||
| Linear solver algorithm for matrix computations. Currently only "KLU" is supported | ||
| This constructor is intentionally KLU-only because `ABA.K` is always a |
There was a problem hiding this comment.
Reasoning behind this decision? Which matrices are AA-compatible and why?
In short, I am trying reduce the number of external dependencies under which we have no control specially around the linear solvers and expose only the code that we need. I have been frustrated with the speed at which the other libraries respond and also at the exposure of mistakes there like it happened with the solve! call. Another reason specially for the linear solvers is to have cohesive infrastructure for PNM and PFs. |
josephmckinsey
left a comment
There was a problem hiding this comment.
I'm mainly confused as to why we are making so many changes to KLU immediately again. I haven't dug into the Apple Accelerate part yet.
There was a problem hiding this comment.
What feature is this used for? I don't actually see any specific code for KLU, so it seems like it could potentially be used for any LinearAlgebra.Factorization.
There was a problem hiding this comment.
I'm a bit confused why we are wrapping this. KLU's internals weren't very thread-safe, which made it a clearer case until that can be fixed upstream, but I thought AppleAccelerate.jl was a bit better now?
Maybe in not the cleanest way I layered the changes needed in KLU here so we can remove the use of KLU.jl in PowerFlows and use our own wrapper. In PFs we use Int32 |
| arc_keys = spread_keys(PNM.get_arc_axis(vptdf_ref), N_ROWS) | ||
| sample = function () | ||
| v = PNM.VirtualPTDF(sys; linear_solver = solver) | ||
| GC.gc() |
There was a problem hiding this comment.
Why the explicit call to garbage collection here? Elsewhere in this file too.
| end | ||
| _populate_lower_triangle_pattern!(cache, A) | ||
| _populate_pattern!(cache, A) | ||
| sym = _sparse_symbolic_factor( |
There was a problem hiding this comment.
nitpick: maybe rename this variable to something other than sym now that it's not a symmetric matrix.
| # Workspace-aware overload — caller-supplied scratch avoids a per-call | ||
| # malloc/free inside libSparse. | ||
| ws = _ensure_solve_workspace!(cache, size(B, 2)) | ||
| GC.@preserve cache _sparse_solve_matrix_ws!(cache.numeric, _dense_matrix(B), ws) |
There was a problem hiding this comment.
Is the GC.@preserve here precautionary or necessary?
| Tv = eltype(X) | ||
| fill!(X, zero(Tv)) |
There was a problem hiding this comment.
Nitpick: pretty sure just fill!(X, 0.0) will work. And if not, then introduce Tv as a type parameter.
| # libSparse rejects factorization type 80. | ||
| const _AA_MIN_MACOS = v"15.5" | ||
|
|
||
| # Query the running macOS product version via the `kern.osproductversion` |
There was a problem hiding this comment.
AppleAccerelate.jl has a very similar function. Could just copy-paste their implementation. [Or do you have reasons for not using theirs?]
| VirtualPTDF / VirtualLODF / VirtualMODF constructors. | ||
| """ | ||
| _default_linear_solver() = Sys.isapple() ? "AppleAccelerate" : "KLU" | ||
| function _default_linear_solver() |
There was a problem hiding this comment.
Nitpick: might belong better in __init__ or in a similar compile-time "run just once" spot.
| _get_PTDF_A_diag(K, BA, A, ref_bus_positions) -> Vector{Float64} | ||
|
|
||
| Compute `diag(PTDF · A)`. Each row of `A` has exactly two nonzeros (+1 at the | ||
| from-bus, -1 at the to-bus), so the per-arc dot product reduces to two indexed | ||
| reads into the solved PTDF row after a one-time transpose of `A`. |
There was a problem hiding this comment.
Pre-existing issue, but: is there really no better way to do this? I have a sense of deja vu, as if I've gone down this path before and it went nowhere.
edit: okay this has lower memory usage than pre-computing the entirety of K^{-1} (which could be dense). I'm guessing that's why we pick this method.
This PR provides usage of the AA framework in the rest of the matrices and changes to KLU to be used later in PowerFlows to drop completely KLU.jl dependency and pass onto PFs the protection against the pointer problem and will close Sienna-Platform/PowerFlows.jl#107